Spike trains analysis basics

Table of Contents

1. Classical descriptive statistics

Spike trains: Stationary regime

After the 'rather heavy' pre-processing stage called spike sorting, the raster plot representing the spike trains can be built:


One spike train: Intervals statistics


Fig. 3 of Perkel, Gerstein and Moore (1967a), DOI: 10.1016/S0006-3495(67)86596-2.

Basic characterization: ISI density estimate


Inter spike intervals (ISI) density estimates for 7 simultaneously recorded neurons.

Several trains: Lagged counts statistics \(\to\) Cross-correlogram

Consider the spike trains of 2 neurons A and B: \[\{t_1^A,t_2^A,\ldots\} \text{ and } \{t_1^B,t_2^B,\ldots\}\]

Define the Cross-correlogram by: \[\widehat{cc}_{AB}(u) = \frac{\left|\left\{u-\delta/2 < t_i^B-t_j^A < u+\delta/2;\, t_i^B \neq t_j^A, i,j=1,2,\ldots\right\}\right|}{\delta{}N^A(]0,T])}\;,\] where \(\left|\{\;\}\right|\) stands for the number of elements of a set, \(N^A(]0,T])\) stands for the number of spikes from A between times 0 and \(T\) and \(\delta\) is a (small) bin size.

\vspace{0.5cm} After Brillinger, Bryant and Segundo (1976) DOI: 10.1007/BF00365087.



Fig. 2 of Moore, Segundo, Perkel and Levitan (1970), DOI: 10.1016/S0006-3495(70)86341-X.

Interpretation issue 1 (Moore et al, 1970)


Interpretation issue 2 (Moore et al, 1970)


Non-stationary regime: odor responses


Raster plots of the 3 recorded neurons from the cockroach antennal lobe. Citronellal is puffed during 0.5 s starting from vertical red line. 1 sec is shown before stimulus onset and 4 sec are shown after. The first stimulation is at the bottom. Is neuron 1 responding to the stimulation? Cockroach (Periplaneta americana) recordings and spike sorting by Antoine Chaffiol.

Classical statistics: PSTH


The Post-Stimulus Time Histogram (PSTH).

2. Approaches based on stochastic models: why?

Membrane noise can lead to output fluctuations

  • We now know something that Hodgkin and Huxley did not know when they introduced their famous model: the sodium and potassium conductances responsible for the action potential are due to voltage-gated ion channels that are pore forming transmembrane proteins.
  • As most macromolecules, ion channels can be found in several transient states and will switch spontaneously between those states.
  • A change in membrane voltage will change the proportion of time they spent in the different states.
  • In the case of voltage-gated sodium channels spontaneous transitions are observed between 'closed' and 'open' states.


Figures 1 and 2 of Sigworth and Neher (1980). The first single channel recordings from sodium channels. Left, experimental setting. Right, single channel currents during 10 mV depolarizations: a, imposed patch membrane potential (40 mV pulse amplitude), one pulse was applied every second; b, average of a set of 300 responses; c, nine successive individual records.

  • Voltage-gated sodium channels have a small conductance (5-10 pS) and cases where the spontaneous opening of a single channel can give rise to a full action potential are rare.
  • It can only occur in very tiny neurons that have therefore an input resistance large enough to enable a tiny current (through a tiny conductance) to bring a membrane potential change large enough to reach threshold.


The stochastic opening of channels can lead, in some cases at least, to fluctuating / jittering spike times as was demonstrated (before channels were known) by Verveen and Derksen (1968)


Montage of Figures 1, 2 and 3 of Verveen and Derksen.

Fluctuations at the neuromuscular junction

  • In 1952, Fatt and Katz reported the observation of small spontaneous potentials from the end-plate region of the frog neuromuscular junction.
  • They named those miniature end-plate potentials (mepp).
  • They argued that these potentials originate from the spontaneous release of transmitter from the nerve terminal.
  • They then studied the response of the same muscle fiber to presynaptic nerve stimulation in a low extracellular calcium condition.
  • They saw that these reduced evoked responses fluctuate in steps whose amplitudes matched the ones of spontaneous potentials.


Top left: spontaneous activity recorded intracellularly from a frog muscle fiber close to the end-plate. Bottom left: response to a presynaptic nerve stimulation recorded from the same location (lower magnification on the \(y\) scale). Top right trace: 50 Hz oscillation giving a time reference. Middle right 3 traces: spontaneous potentials. Bottom right trace: superposed evoked responses recorded in low calcium; the presynaptic stimulation time is indicated by the arrow.

  • In the same paper, Fatt and Katz studied the distribution of the intervals between successive mepp and showed it to be compatible with an exponential distribution.
  • This is a first indication that these miniature events follow a homogeneous Poisson process.
  • These inter mepp interval data can be found in the Appendix of Cox and Lewis' book where they are wrongly described as inter spike interval data.

  • It should be noted that this observation of the mepp and the suggestion that 'normal' evoked end-plate potentials are made of a bunch of mepp predates the observation, using electron-microscopy, of what we now call synaptic vesicles in the presynaptic terminal.
  • De Robertis and Bennett in 1955 and Palay in 1956, describing the synaptic vesicles, were the first to suggest that a mepp is due to the fusion of a single vesicle.


Empirical and fitted distribution functions of the inter mepp intervals. The dots are the data and the continuous line the fitted exponential distribution function. A series of 800 mepp was used, the shortest interval that could be resolved was 5 ms, the mean interval was 221 ms (Fatt & Katz, 1952).

Quantal neurotransmitter release

  • In 1954, Del Castillo and Katz investigated systematically the 'composite nature' of evoked end-plate potentials.
  • Their conclusion are best summarized by the title they chose for their paper: Quantal components of the end-plate potential.
  • In high magnesium conditions–that reduces release since, as we now know, it blocks calcium channels–, not only could they reproduce Fatt and Katz observation of fluctuating evoked potentials but they could also observe transmission failures.


Left, Fluctuating evoked end-plate potentials in high magnesium (note the scattered mepp). Right (high magnesium and low calcium conditions): top, mepp; middle, 50 Hz cycles for time reference; bottom, evoked responses with many 'failures'; stimulus artifact and response latency are indicated by a pair of dotted vertical lines.

Comparing the mepp amplitude distribution with the one of the evoked potentials in low calcium conditions, they proposed the following scenario, commonly referred to as the quantal neurotransmitter release:

  • the presynaptic terminal contains \(n\) vesicles;
  • when a presynaptic action potential invades the terminal, each of the \(n\) vesicle will independently of the others release the transmitter it contains with a probability \(p\) (\(p\) depends on the extracellular calcium and magnesium concentrations);
  • each vesicle that releases its content gives rise to an elementary end-plate potential whose distribution is the same as the one of the mepp;
  • the evoked end-plate potential is the sum of the individual potentials due to each vesicle that released its transmitters.

Under this scenario/hypothesis the evoked potential should follow a binomial distribution–or, if \(p\) is small and \(n\) is large, a Poisson distribution with parameter \textit{np}–corrupted by noise (from an increasing number of normally distributed mepp) as is indeed observed:


What about the central nervous system?

  • These studies carried out at 'the' neuromuscular junction–a synapse designed to be reliable–exhibit marked fluctuations in specific conditions (low calcium or high magnesium) while in normal conditions the fluctuations can be neglected: every time a presynaptic spike arrives, muscle contraction ensues.
  • In the central nervous system, synapses typically have a small number of vesicles (\(n\)) and the release probability (\(p\)) is rarely close to 1.
  • Marked fluctuations are therefore the rule, even in physiological conditions.


Left, direct excitatory connection (Miles and Wong, 1986); right, direct inhibitory connection (Miles and Wong, 1984). CA3 region of the guinea-pig hippocampus.


  • We are facing now a modeling problem: where and how shall we incorporate 'stochastic components' in our neuron models?
  • There are clearly several possibilities:
    • one would be to stick to the 'detailed biophysical model' tradition and take a 'usual' model and for instance, replace the Hodgkin and Huxley conductance model by a Markov process one;
    • the somewhat 'opposite' approach would be to 'lump together' the known and unknown sources of variability in a single component of the model;
    • this is the way we will pursue in the sequel and we are going to start by reviewing some of the solutions that have been proposed along this line.

3. Approaches based on stochastic models: examples

Fitzhugh (1958): background

In the mid-fifties, Kuffler, Fitzhugh and Barlow start studying the retina, concentrating on the output neurons, the ganglion cells. They describe different types of cells, on and off cells and show that when the stimulus intensity is changed, the cells exhibit a transient response before reaching a maintained discharge that is stimulus independent as shown if their figure 5, reproduced here:


They then make a statistical description of the maintained discharge that looks like:


They propose a gamma distribution model for the inter spike intervals for the 'maintained discharge regime', we would say the stationary regime:


Building a discharge model for the transient regime

Fitzhugh wants first to build a discharge model for the transient parts of the neuron responses; his model boils down to:

  • The neuron has its own 'clock' whose rate is influenced by the stimulus that can increase it or decrease it (as long as the rate remains nonnegative).
  • The neuron always discharge following an renewal process with a fixed gamma distribution with respect to its own time (the time given by its clock).
  • Since the clock rate can be modified by a stimulus, the neuron time can be 'distorted'–the distortion is given by the integral of the rate–with respect to the 'experimental time' and the spike times generated by the neuron can be nonstationary and exhibit correlations.

Fitzhugh illustrates the working of his model by assuming the following rate (clock) function \(f\):


The way to generate the observed spike train assuming this rate function is illustrated next:



  • Fitzhugh proposes to estimate \(f'\) from what we would now call a normalized version of the Peri-Stimulus Time Histogram.
  • The normalization is done by setting the rate in the stationary regime to 1.


  • Fitzhugh approach could obviously be generalized to interactions between neurons.
  • Synaptic coupling could translate into acceleration (for excitatory synapses) and deceleration (for inhibitory ones) of the clock rate.
  • It is strange that no one apparently explored these possibilities further.
  • There is moreover a close connection to what statisticians call accelerated failure time model.

Gerstein & Mandelbrot (1964): background

Using data from the cochlear nucleus of cats, Gerstein and Mandelbrot in their 1964 paper try to account for two data features:

  1. The spike trains in the stationary regime are well approximated by renewal processes and the distribution of the sums of 2, 3 or more intervals looks sometimes like a scaled version of the distribution of a single interval convolved with itself twice, three times, etc; in other words the interval distribution might be a stable distribution.
  2. The interval probability density functions typically exhibit a marked asymmetry (fast rise and sometimes very slow decay), even stronger than the ones fitted with gamma distribution by Kuffler, Fitzhugh and Barlow


  • To get a model with 'heavy tails' and with parameters that can be interpreted physiologically, they consider the dynamics of the membrane potential as driven by constant bombardment of not directly observed excitatory and inhibitory inputs:
    • an excitatory input 'gives an upward kick' to the membrane potential
    • an inhibitory one gives a 'downward kick'.
  • Assuming:

    • a large, virtually infinite, membrane time constant;
    • a large number of uncorrelated presynaptic neurons leading to individual inputs of small amplitudes;

    the membrane potential is expected to follow a random walk converging to a Wiener process/Brownian motion process (as the number of presynaptic neurons goes to infinity and the as the input amplitude goes to 0).

  • If one sets a threshold such that every time the membrane potential reaches this threshold:

    • an action potential is emitted
    • the membrane potential is reset to a standard value,

    one gets a stochastic model of the discharge.

  • The first 'problem' of interest for us is the determination of distribution of the first passage time (FPT) or first hitting time through the threshold, since these FPT are the inter spike intervals.

In situations where the excitatory inputs dominate, the membrane potential exhibits a drift:


In the paper Gerstein and Mandelbrot justify the FPT probability density of their random-walk-plus-drift model. This FPT distribution is an inverse Gaussian distribution also known as a Wald distribution; its expression is:

\begin{equation*} f_{FPT}(t) = \sqrt{\frac{\lambda}{2 \pi{} t^3}} \exp \left(-\frac{\lambda{}(t-\mu)^2}{2 \mu^2 t}\right), \quad T, \lambda, \mu > 0, \end{equation*}


  • \(\lambda = r^2/\sigma^2\) (\(r\) is the distance in [mV] between reset and threshold
  • \(\sigma^2\) is the diffusion constant of the Brownian motion process [\(\text{mV}^2/\text{ms}\)])
  • \(\mu=r/\delta\) (\(\delta\) is the membrane potential drift [mV/ms]).

They present comparison between histogram based FPT densities estimation and fitted inverse-Gaussian densities:



  • This model and some of its variants have enjoyed a big popularity among 'diffusion expert' probabilists (Ricciardi, Sacerdote).
  • Again, synaptic interactions can be accommodated by allowing specific inputs to modify the drift in a deterministic fashion.
  • A problem appears when we want to do statistical inference in a general context (the drift depends on synaptic inputs or on stimulation):
    • We then want to get the likelihood implying that we need to compute the probability density of each inter spike interval, many many times (for different parameter values).

  • Simulating diffusion explicitly is not the best solution; it is slow and error prone.
  • It is much more efficient to keep the drift at zero and to make the threshold change (with the opposite magnitude of the original drift).
  • We are then looking for the FPT of a Brownian motion (without drift) across a time dependent boundary and there are very efficient numerical methods–with bounded errors–for that.

Chornoboy, Schramm and Karr (1988): overview

  • This article focuses on the estimation of the synaptic coupling between neurons (from observed spike trains).
  • It is not based on actual data.
  • A model is presented in a very detailed way:
    • inference is strongly emphasized
    • a lot of formal properties of the estimator are proven,
    • an estimation algorithm, whose convergence is also proven, is presented.
  • Then simulations are used to check the performances of the method and to compare them with alternative methods available at that time.


  • The authors model directly the rate or intensity function.
  • Their rate function is a Hawkes process rate function.
  • More precisely, they consider:
    • a network made of \(J\) neurons,
    • the counting process of \(j\) is \(\{N_j(t), t \ge 0\}\),
    • the counting processes interact and their interaction is modeled by the convolution of a kernel, \(h_{j \rightarrow i}^1\) specific of each neuron pair and of the differential of the presynaptic neuron (\(j\)) counting process,

(Clarification: the differential of a counting procees)

'The convolution of a kernel and of the differential of the presynaptic neuron (\(j\)) counting process' means:

  • The effect of \(j\) on the rate function of \(i\) at time \(t\) takes the form: \[\int_0^{t^{-}} h_{j \rightarrow i}^1(t-u) dN_j(u)\]
  • For a given spike train \(\{t_1^j,t_2^j,\ldots\}\) of neuron \(j\), this integral translates into: \[\sum_{t_k^j

(back to the model)

The rate function of neuron \(i\) is then given by:

\begin{equation*} \lambda_i(t) = h_i^0 + \sum_{j=1}^J \int_0^{t^{-}} h_{j \rightarrow i}^1(t-u) dN_j(u)\, , \end{equation*}

where you see that:

  • a sum is made on all the neurons of the network, including \(i\) itself;
  • the individual effects (of each spike from each neuron) sum linearly;
  • the term \(h_i^0\), the 'spontaneous rate' would make the process a Poisson process if all the \(h_{j \rightarrow i}^1\) were uniformly 0.

An obvious weakness of this model is that since the rates \(\lambda_i\) must be nonnegative, inhibitory interactions are de facto out of the picture.


They maximize the log-likelihood that takes the following form assuming observation between 0 and \(T\):\[\mathcal{L}(\theta) = \sum_{i=1}^J \mathcal{L}_i(\theta_i),\] with \[\mathcal{L}_i(\theta_i) = \sum_{k=1}^{n_i} \log\left(\lambda_i(t_k^i;\theta_i)\right) \exp\left(-\int_0^T \lambda_i(u;\theta_i) du\right),\] where:

  • the \(\{t_1^i,\ldots,t_{n_i}^i\}\), are the spikes of neuron \(i\)
  • \(\theta_i = \{h_i^0,h_{1 \rightarrow i}^1,\ldots,h_{J \rightarrow i}^1\}\) are the 'parameters' of neuron \(i\)
  • \(\lambda_i(u;\theta_i)\) is given by our last equation
  • \(\theta = \{\theta_1,\ldots,\theta_J\}\).

  • Clearly, the \(J^2\) kernels \(h_{j \rightarrow i}^1\) cannot be arbitrary functions.
  • Otherwise inference would be impossible since the number of parameters to estimate would be infinite.
  • So Chornoboy et al. restrict the kernel to a class of piecewise constant functions (or step functions) with bounded support (\([0,640]\) in ms) and 64 pieces (each piece 10 ms long), as illustrated below:


  • The authors develop an Expectation-maximization (EM) algorithm for estimating the \(J + J^2 \times 64\) parameters of their model.
  • An advantage of this algorithm is that the nonnegativity of the parameters is automatically satisfied.


The authors consider first:

  • a network made of 2 (\(J=2\)) neurons, one, say neuron 1, presynaptic to the other one, neuron 2.
  • The effect of 1 on 2, \(h_{1 \rightarrow 2}\) is given by the kernel just shown.
  • They then generate spike trains of neuron 1 with several schemes:
    • one described as 'over-clustered',
    • the other described as 'under-clustered'
  • The goal here is to move away from the homogeneous Poisson assumption made by the available methods
  • They simulate the postsynaptic train given the presynaptic one and the interaction kernel.
  • They then run their inference method together with the method of choice a that time they name 'method of moments'.

For the 'over-clustered' case they get:


For the 'under-clustered' case they get:


  • In both cases, the proposed method is clearly better.
  • This is not that surprising since the data are then fitted with the model that generated them in the model family considered.

They consider next two neurons reciprocally connected with the same kernel and compare again the two methods:


The next example involves three neurons: 1 'talks' to 2 that 'talks' to 3, but 1 does not 'talk' directly to 3. The question is: does the proposed method find that there is no direct connection between 1 and 3?


And the answer is yes!

The last example involves three neurons again with 1 talking to 2 and 3 (common input) without direct connection between 2 and 3. Again the question is: does the proposed method find that there is no direct connection between 2 and 3?


Again, the method works.


  • As discussed in the paper, inhibition can be dealt with if one 'rectifies' \(\ambda\), working for instance with: \[\lambda_i(t) = \exp \left(h_i^0 + \sum_{j=1}^J \int_0^{t^{-}} h_{j \rightarrow i}^1(t-u) dN_j(u)\right)\, .\]
  • The problem is then that the nice EM algorithm developed by the authors does not work anymore (but more general ones can be used).
  • For an alternative handling of inhibition see Duval, Luçon and Pouzat (20222).

  • Another shortcoming of the model is the difficulty to handle a stimulus.
  • Indeed, in the proposed framework, one would also model the stimulus by a piecewise function.
  • But ones runs then into nasty unidentifiability problems because of the interplay between \(h_i^0\) and the stimulus baseline (the easy solution is to force that baseline at 0).
  • All the nice examples shown in the paper are nice because the whole network is observed.
    • In real life, that is never the case and that can lead to estimating wrong connections; exactly as the method of moments does in the last example.
  • This paper was essentially ignored for 15-17 years, but has recently enjoyed a rebirth (Reynaud-Bouret, Hodara, Lambert).
  • The Hawkes process is now also heavily used for the modeling of financial transactions…

Brillinger (1988): overview

  • This last article is rarely cited although it had the most profound influence on spike train analysis as it is performed today.
  • On the surface it is the follow up and extension of a former work of Brillinger himself together with his long time collaborator José Segundo on Aplysia ganglion (this explains why only pairs of neurons are explicitly considered).


Brillinger starts by transforming spike trains into binary vectors (with 0 in bins/vector elements that contain no spike and 1 in bins that contain one spike).

For instance:

  • if the time unit is the millisecond
  • if the spike train for the first 10 ms is \((2.23,7.53)\)

then using a 1 ms bin width and starting binning from 0 ms we get:

\begin{align*} \text{Index set } & \{0,1,2,3,4,5,6,7,8,9,\ldots\} \\ \text{Counting process sample path } & \{0,0,1,1,1,1,1,2,2,2,\ldots\} \\ \text{Point process sample path } & \{0,0,1,0,0,0,0,1,0,0,\ldots\} \end{align*}

  • From a modeling viewpoint we are then dealing with a sequence of Bernoulli trials, one per bin.
  • That is, in bin \(t\) a biased coin is tossed with a probability \(p(t)\) to fall on head (give a spike) and \(1-p(t)\) to fall on tail (no spike).
  • From a statistical viewpoint, the problem of estimating the \(\{p(t)\}\) is called a binary regression.
  • Time discretization is in fact necessary to end's up with a generalized linear model, the now famous GLM that almost everyone mentions in neuroscience without being able to explain what it is or what it means…

  • To be more explicit, Brillinger writes \(\{X_C(t),t \ge 0\}\), respectively \(\{X_B(t),t \ge 0\}\), the counting process (associated to the spike train) of \(C\), respectively \(B\).
  • The probability of the spike train of \(C\) can then be written (assuming that observations are available between 0 and \(T\)): \[\prod_{t=0}^T p_C(t)^{X_C(t)} \left(1-p_C(t)\right)^{1-X_C(t)}\,.\]

The functional form of \(p_C(t)\) adopted by Brillinger is:

\begin{align*} p_C(t) = \Phi{} (&\sum_{k=L_C(t)+1}^t h_k X_B(t-k) + \alpha_1 \left(t-L_C(t)-1\right) \\ &\quad {} + \alpha_2 \left(t-L_C(t)-1\right)^2 + \alpha_3 \left(t-L_C(t)-1\right)^3 - \alpha_0 )\,, \end{align*}


  • \(L_C(t)\) stands for the index of the last spike of neuron \(C\), prior to \(t\) (\(L_C(t)= \sup \{j
  • the \(\{h_k\}\) describe the postsynaptic effect in neuron \(C\) at time \(t\) of a spike that occurred at time \(t-k\) in neuron \(B\);
  • the \(\{\alpha_0,\alpha_1,\alpha_2,\alpha_3\}\) describe intrinsic properties of neuron \(C\) (they can be viewed either as a variable threshold or as a self-inhibitory effect),
  • \(\Phi\) is the standard normal distribution function.

We could make this formulation much more similar to the one of Chornoboy et al. by writing for the first term on the right-hand side: \[\sum_{k=L_C(t)+1}^t h_k X_B(t-k) = \int_{L_C(t)+1}^t h_{B \rightarrow C}^1(t-u) dN_B(u),\] where \(h_{B \rightarrow C}^1\) would also be piecewise constant; for the 'second' term: \[\sum_{i=1}^3\alpha_i \left(i-L_C(t)\right)^i = \int_{L_C(t)+1}^t h_{C \rightarrow C}^1(t-u) dN_C(u) \] where \(h_{C \rightarrow C}^1(x)\) would be \(\alpha_1 + 2 \alpha_2 x + 3 \alpha_3 x^2\), and for the last: \[\alpha_0 = - h_C^0 \,.\]

We would then write our probability of observing a spike from neuron \(C\) at time \(t\) as:

\begin{align*} \[p_C(t) = \Phi( &\int_{L_C(t)+1}^t h_{B \rightarrow C}^1(t-u) dN_B(u) \\ & \quad {} + \int_{L_C(t)+1}^t h_{C \rightarrow C}^1(t-u) dN_C(u) + h_C^0 )\, . \end{align*}

We see here a key difference between Brillinger's model and the 'rectified' Hawkes process of Chornoboy et al.:

  • the integration lower limit is set by the time/index of the last spike from the neuron of interest in the former,
  • it extends all the way back to the recording origin in the latter. Brillinger's model is a model with reset (a property that is discussed and justified by Brillinger and Segundo).


The 'insistence' of Brillinger to put the spike train analysis problem into a GLM framework is justified on two grounds:

  • the GLM theory was already fully worked out at the time–estimator convergence, estimator variance, etc.–,
  • a generic software, GLIM, was already available (and thoroughly tested) at that time to fit GLM to data and Brillinger did distribute his GLIM script–that makes him not only the most important contributor to spike train analysis from the statisticians community, but also a pioneer of reproducible research.

4. The big issue linked to the sparse sampling of the network

The 'typical' case

  • We study a network made of excitatory and inhibitory neurons:
    • \(N_E\) excitatory neurons,
    • \(N_I\) inhibitory neurons.
  • These neurons form a directed graph (digraph).



What do we observe?


A very tiny fraction of the nodes / neurons

What to we estimate?


Our association statistics (cross-correlograms, interaction kernels \(h_{j \rightarrow i}^1\), etc) lead to a functional graph.

So what is the problem with the statistics discussed so far?

  • Functional graphs are not reproducible; in a given experiment they even change when the recording conditions change (presence / absence of a stimulus).
  • Spike sorting errors are not accounted for.
  • All the knowledge we have accumulated on the micro-anatomy, the cellular physiology, etc. is ignored.
  • We need a new paradigm, I think.


Created: 2022-09-06 mar. 11:27